Stop and Frisk in New York City from 2017 - 2019

Text Analysis
R
Python

An analysis of stop and frisk plicy in NYC using R. This article is based off a project initially completed in python.

Roupen Khanjian true
03-14-2021
Code
# Packages 
library(tidyverse) # Easily Install and Load the 'Tidyverse', CRAN v1.3.0
library(patchwork) # The Composer of Plots, CRAN v1.1.1
library(here) # A Simpler Way to Find Your Files, CRAN v1.0.1
library(janitor) # Simple Tools for Examining and Cleaning Dirty Data, CRAN v2.1.0
library(tidytext) # Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools, CRAN v0.3.0
library(textdata) # Download and Load Various Text Datasets, CRAN v0.4.1
library(vader) # Valence Aware Dictionary and sEntiment Reasoner (VADER), CRAN v0.2.1
library(readxl) # Read Excel Files, CRAN v1.3.1
library(lubridate) # Make Dealing with Dates a Little Easier, CRAN v1.7.10
library(gt) # Easily Create Presentation-Ready Display Tables, CRAN v0.2.2
library(kableExtra) # Construct Complex Table with 'kable' and Pipe Syntax, CRAN v1.3.4
library(knitr) # A General-Purpose Package for Dynamic Report Generation in R, CRAN v1.31
library(plotly) # Create Interactive Web Graphics via 'plotly.js', CRAN v4.9.3

Introduction

This article was originally completed in python here: link

During the past 3 weeks, protests have taken place all over the country and the world in response to the murder of George Floyd by the Minneapolis Police. Since the advent of video recordings via cell phones, there have been many instances of police brutality captured against Black people. The relationship between law enforcement and the Black community has always been tenuous and now it has been brought to the attention of the rest of the world. In the landmark Supreme court case of Terry v. Ohio (1968) it was ruled that police could stop, question, and frisk a person if they have reasonable suspicion that the person had committed a crime (Brandes, S.A. et al., 2019).

In the 2000’s to the early 2010’s the New York City (NYC) stop and frisk policy garnered national attention due to the high number of stops and profiling of Black people. At the height of the policy, in 2011 there were 658,724 stops recorded with over 50% of the stops targeting Black people (NYCLU 2019). Since then, the number of stops per year has substantially decreased to 13,459 stop in 2019. Opponents of this policy argue this is still too many stops, especially since in 2019 about 66% of the people stopped were innocent. It has been also shown that the stopping of white people more likely led to an arrest in comparison to Black and Hispanic people, implying the police may be targeting minorities and being more mindful of stopping white people (Gelman, A., et. al., 2007). There has been substantial research conducted showing the psychological distress of a stop and frisk policy on communities of color in NYC (Sewell, A. et al., 2016).

In this project I analyze the Stop, Question and Frisk Data from the New York Police Department (NYPD) from the most current three years: 2017, 2018, and 2019 (NYC Stop and Frisk Data). I chose these years for the following reasons: The years 2018-2019 was not included in the most recent NYCLU report. In 2017 the NYPD moved to an electronic form, as opposed to manually writing down a response for each question in the handwritten forms used prior to 2017. Lastly 2017 was the first year of the Trump presidency and I was curious to investigate if his rhetoric on race may have affected law enforcements behavior toward minorities.

Data was downloaded from the NYPD website link. In each dataset each row is a stop of a specific person, and each column is a variable. There are a total of 83 different variables in each dataset.

Code
# Read in data
# file_names <- list.files(here("_texts", 
#                               "stop_frisk_R", 
#                               "data")) # file names for each episode
# 
# frisk <- str_glue(here("_texts","stop_frisk_R", 
#                               "data/"),"{file_names}") %>% 
#   map_dfr(read_excel) # read in all the episodes into one data frame!

saf_17 <- read_excel(here("_texts", "stop_frisk_R", "data",
                          "sqf_2017.xlsx"))

saf_18 <- read_excel(here("_texts", "stop_frisk_R", "data",
                          "sqf_2018.xlsx"))

saf_19 <- read_excel(here("_texts", "stop_frisk_R", "data",
                          "sqf_2019.xlsx"))

# fix date column and select columns to be used for analysis
saf_17 <- saf_17 %>%
  mutate(STOP_FRISK_DATE = ymd(STOP_FRISK_DATE)) %>%
  select(
    c(
      "STOP_FRISK_DATE",
      "STOP_FRISK_TIME",
      "YEAR2",
      "MONTH2",
      "DAY2",
      "ISSUING_OFFICER_RANK",
      "SUSPECTED_CRIME_DESCRIPTION",
      "SUSPECT_ARRESTED_FLAG",
      "SUSPECT_ARREST_OFFENSE",
      "OFFICER_IN_UNIFORM_FLAG",
      "FRISKED_FLAG",
      "SEARCHED_FLAG",
      "OTHER_CONTRABAND_FLAG",
      "FIREARM_FLAG",
      "KNIFE_CUTTER_FLAG",
      "OTHER_WEAPON_FLAG",
      "WEAPON_FOUND_FLAG",
      "PHYSICAL_FORCE_CEW_FLAG",
      "PHYSICAL_FORCE_DRAW_POINT_FIREARM_FLAG",
      "PHYSICAL_FORCE_HANDCUFF_SUSPECT_FLAG",
      "PHYSICAL_FORCE_OC_SPRAY_USED_FLAG",
      "PHYSICAL_FORCE_OTHER_FLAG",
      "PHYSICAL_FORCE_RESTRAINT_USED_FLAG",
      "PHYSICAL_FORCE_VERBAL_INSTRUCTION_FLAG",
      "PHYSICAL_FORCE_WEAPON_IMPACT_FLAG",
      "SUSPECTS_ACTIONS_CASING_FLAG",
      "SUSPECTS_ACTIONS_PROXIMITY_TO_SCENE_FLAG",
      "DEMEANOR_OF_PERSON_STOPPED",
      "SUSPECT_REPORTED_AGE",
      "SUSPECT_SEX",
      "SUSPECT_RACE_DESCRIPTION",
      "SUSPECT_BODY_BUILD_TYPE",
      "SUSPECT_OTHER_DESCRIPTION",
      "STOP_LOCATION_PRECINCT",
      "STOP_LOCATION_FULL_ADDRESS",
      "STOP_LOCATION_STREET_NAME",
      "STOP_LOCATION_PATROL_BORO_NAME",
      "STOP_LOCATION_BORO_NAME",
      "SUSPECTS_ACTIONS_DRUG_TRANSACTIONS_FLAG"
    )
  )

saf_18 <- saf_18 %>%
  mutate(STOP_FRISK_DATE = ymd(STOP_FRISK_DATE)) %>%
  select(
    c(
      "STOP_FRISK_DATE",
      "STOP_FRISK_TIME",
      "YEAR2",
      "MONTH2",
      "DAY2",
      "ISSUING_OFFICER_RANK",
      "SUSPECTED_CRIME_DESCRIPTION",
      "SUSPECT_ARRESTED_FLAG",
      "SUSPECT_ARREST_OFFENSE",
      "OFFICER_IN_UNIFORM_FLAG",
      "FRISKED_FLAG",
      "SEARCHED_FLAG",
      "OTHER_CONTRABAND_FLAG",
      "FIREARM_FLAG",
      "KNIFE_CUTTER_FLAG",
      "OTHER_WEAPON_FLAG",
      "WEAPON_FOUND_FLAG",
      "PHYSICAL_FORCE_CEW_FLAG",
      "PHYSICAL_FORCE_DRAW_POINT_FIREARM_FLAG",
      "PHYSICAL_FORCE_HANDCUFF_SUSPECT_FLAG",
      "PHYSICAL_FORCE_OC_SPRAY_USED_FLAG",
      "PHYSICAL_FORCE_OTHER_FLAG",
      "PHYSICAL_FORCE_RESTRAINT_USED_FLAG",
      "PHYSICAL_FORCE_VERBAL_INSTRUCTION_FLAG",
      "PHYSICAL_FORCE_WEAPON_IMPACT_FLAG",
      "SUSPECTS_ACTIONS_CASING_FLAG",
      "SUSPECTS_ACTIONS_PROXIMITY_TO_SCENE_FLAG",
      "DEMEANOR_OF_PERSON_STOPPED",
      "SUSPECT_REPORTED_AGE",
      "SUSPECT_SEX",
      "SUSPECT_RACE_DESCRIPTION",
      "SUSPECT_BODY_BUILD_TYPE",
      "SUSPECT_OTHER_DESCRIPTION",
      "STOP_LOCATION_PRECINCT",
      "STOP_LOCATION_FULL_ADDRESS",
      "STOP_LOCATION_STREET_NAME",
      "STOP_LOCATION_PATROL_BORO_NAME",
      "STOP_LOCATION_BORO_NAME",
      "SUSPECTS_ACTIONS_DRUG_TRANSACTIONS_FLAG"
    )
  )

saf_19 <- saf_19 %>%
  mutate(STOP_FRISK_DATE = ymd(STOP_FRISK_DATE)) %>%
  select( # select for the years 
    c(
      "STOP_FRISK_DATE",
      "STOP_FRISK_TIME",
      "YEAR2",
      "MONTH2",
      "DAY2",
      "ISSUING_OFFICER_RANK",
      "SUSPECTED_CRIME_DESCRIPTION",
      "SUSPECT_ARRESTED_FLAG",
      "SUSPECT_ARREST_OFFENSE",
      "OFFICER_IN_UNIFORM_FLAG",
      "FRISKED_FLAG",
      "SEARCHED_FLAG",
      "OTHER_CONTRABAND_FLAG",
      "FIREARM_FLAG",
      "KNIFE_CUTTER_FLAG",
      "OTHER_WEAPON_FLAG",
      "WEAPON_FOUND_FLAG",
      "PHYSICAL_FORCE_CEW_FLAG",
      "PHYSICAL_FORCE_DRAW_POINT_FIREARM_FLAG",
      "PHYSICAL_FORCE_HANDCUFF_SUSPECT_FLAG",
      "PHYSICAL_FORCE_OC_SPRAY_USED_FLAG",
      "PHYSICAL_FORCE_OTHER_FLAG",
      "PHYSICAL_FORCE_RESTRAINT_USED_FLAG",
      "PHYSICAL_FORCE_VERBAL_INSTRUCTION_FLAG",
      "PHYSICAL_FORCE_WEAPON_IMPACT_FLAG",
      "SUSPECTS_ACTIONS_CASING_FLAG",
      "SUSPECTS_ACTIONS_PROXIMITY_TO_SCENE_FLAG",
      "DEMEANOR_OF_PERSON_STOPPED",
      "SUSPECT_REPORTED_AGE",
      "SUSPECT_SEX",
      "SUSPECT_RACE_DESCRIPTION",
      "SUSPECT_BODY_BUILD_TYPE",
      "SUSPECT_OTHER_DESCRIPTION",
      "STOP_LOCATION_PRECINCT",
      "STOP_LOCATION_FULL_ADDRESS",
      "STOP_LOCATION_STREET_NAME",
      "STOP_LOCATION_PATROL_BORO_NAME",
      "STOP_LOCATION_BORO_NAME",
      "SUSPECTS_ACTIONS_DRUG_TRANSACTIONS_FLAG"
    )
  )

# row bind all 3 years into one data frame
saf <- rbind(saf_17, saf_18, saf_19) 

# clean column names
saf <- saf %>% 
  clean_names()

# view first 5 rows
head(saf, n = 10) %>% 
  kbl(caption = "<b style = 'color:black;'>
       First ten rows of stop and frisk dataset.") %>%
  kable_material_dark(bootstrap_options = c("striped", "hover")) %>%
  row_spec(0, color = "white", background = "#222222") %>%
  scroll_box(width = "100%", height = "300px", 
             fixed_thead = list(enabled = T, background = "#222222"))
Table 1: First ten rows of stop and frisk dataset.
stop_frisk_date stop_frisk_time year2 month2 day2 issuing_officer_rank suspected_crime_description suspect_arrested_flag suspect_arrest_offense officer_in_uniform_flag frisked_flag searched_flag other_contraband_flag firearm_flag knife_cutter_flag other_weapon_flag weapon_found_flag physical_force_cew_flag physical_force_draw_point_firearm_flag physical_force_handcuff_suspect_flag physical_force_oc_spray_used_flag physical_force_other_flag physical_force_restraint_used_flag physical_force_verbal_instruction_flag physical_force_weapon_impact_flag suspects_actions_casing_flag suspects_actions_proximity_to_scene_flag demeanor_of_person_stopped suspect_reported_age suspect_sex suspect_race_description suspect_body_build_type suspect_other_description stop_location_precinct stop_location_full_address stop_location_street_name stop_location_patrol_boro_name stop_location_boro_name suspects_actions_drug_transactions_flag
2017-01-16 1899-12-31 14:26:00 2017 January Monday SGT TERRORISM N (null) Y N N N (null) (null) (null) N (null) (null) (null) (null) (null) (null) Y (null) (null) (null) TERRORISM 39 MALE (null) THN (null) 1 180 GREENWICH STREET GREENWICH STREET PBMS MANHATTAN (null)
2017-01-16 1899-12-31 14:26:00 2017 January Monday SGT TERRORISM N (null) Y N N N (null) (null) (null) N (null) (null) (null) (null) (null) (null) Y (null) (null) (null) TERRORISM 37 MALE (null) MED (null) 1 180 GREENWICH STREET GREENWICH STREET PBMS MANHATTAN (null)
2017-02-08 1899-12-31 11:10:00 2017 February Wednesday POM OTHER N (null) N N N N (null) (null) (null) N (null) (null) (null) (null) (null) (null) Y (null) (null) (null) OTHER (null) FEMALE WHITE THN N/A 1 WALL STREET && BROADWAY WALL STREET PBMS MANHATTAN (null)
2017-02-20 1899-12-31 11:35:00 2017 February Monday POM GRAND LARCENY AUTO N (null) Y Y Y N (null) (null) (null) N (null) (null) Y (null) (null) (null) (null) (null) (null) (null) GRAND LARCENY AUTO 31 MALE BLACK HISPANIC U UNK 1 75 GREENE STREET GREENE STREET PBMS MANHATTAN (null)
2017-02-21 1899-12-31 13:20:00 2017 February Tuesday POM BURGLARY N (null) Y N N N (null) (null) (null) N (null) (null) Y (null) (null) (null) (null) (null) (null) Y BURGLARY (null) FEMALE BLACK THN (null) 1 429 WEST BROADWAY WEST BROADWAY PBMS MANHATTAN (null)
2017-02-17 1899-12-31 21:25:00 2017 February Friday POM CPW Y MENACING Y Y Y N (null) (null) (null) N (null) (null) (null) (null) (null) (null) Y (null) (null) (null) CPW 39 MALE WHITE HISPANIC MED (null) 1 WEST STREET && CHAMBERS STREET WEST STREET PBMS MANHATTAN (null)
2017-02-25 1899-12-31 20:00:00 2017 February Saturday POM CPW N (null) Y N N N (null) (null) (null) N (null) (null) Y (null) (null) Y Y (null) (null) (null) CPW 19 FEMALE BLACK THN (null) 1 CHAMBERS STREET && WEST BROADWAY CHAMBERS STREET PBMS MANHATTAN (null)
2017-02-25 1899-12-31 19:58:00 2017 February Saturday POM CPW N (null) Y Y Y N (null) (null) (null) N (null) (null) (null) (null) (null) Y Y (null) (null) Y NORMAL 15 FEMALE BLACK THN (null) 1 CHAMBERS STREET && WEST BROADWAY CHAMBERS STREET PBMS MANHATTAN (null)
2017-02-21 1899-12-31 13:15:00 2017 February Tuesday POM BURGLARY N (null) Y Y Y N (null) (null) (null) N (null) (null) Y (null) (null) (null) Y (null) (null) Y PLEASANT 43 MALE BLACK HEA (null) 1 429 WEST BROADWAY WEST BROADWAY PBMS MANHATTAN (null)
2017-03-03 1899-12-31 08:16:00 2017 March Friday POM CRIMINAL MISCHIEF Y CRIMINAL POSSESSION OF CONTROLLED SUBSTANCE Y Y Y Y (null) (null) (null) N (null) (null) (null) (null) (null) (null) Y (null) (null) (null) NERVOUS 33 MALE BLACK THN (null) 1 CORTLANDT STREET && CHURCH STREET CORTLANDT STREET PBMS MANHATTAN (null)
Code
nrow(saf)
[1] 36096
Code
saf %>% 
  count(suspect_race_description) %>% 
  summarise(race = suspect_race_description,
            n = n,
            percent = n / sum(n)) %>% 
  gt() %>% 
  tab_header(
    title = "Which Race was Stop and Frisked the Most?",
    subtitle = "Data from 2017-2019"
  ) %>% 
  fmt_percent( 
    columns = vars(percent),
    decimals = 2
    ) %>% 
  data_color(
    columns = vars(n, percent),
    colors = scales::col_numeric(
      palette = c(
        "lightskyblue", "dodgerblue", "royalblue4") ,
      domain = NULL
        )
      ) %>% 
  tab_style(
    style = list(
      cell_text(style = "oblique"),
      cell_borders(
        side = c("right"), 
        color = "black",
        weight = px(2)
        )
    ),
    locations = cells_body(
      columns = 1
    )
    )
Which Race was Stop and Frisked the Most?
Data from 2017-2019
race n percent
(null) 420 1.16%
AMER IND 9 0.02%
AMERICAN INDIAN/ALASKAN N 8 0.02%
AMERICAN INDIAN/ALASKAN NATIVE 16 0.04%
ASIAN / PACIFIC ISLANDER 522 1.45%
ASIAN/PAC.ISL 206 0.57%
BLACK 20817 57.67%
BLACK HISPANIC 3102 8.59%
MALE 7 0.02%
WHITE 3266 9.05%
WHITE HISPANIC 7723 21.40%

In the above table we see that the people stopped were mostly described as have one of the following four races: Black, White Hispanic, White, or Black Hispanic.

Data Wrangling and Cleaning

Code
saf_sub <- saf %>% 
  # filter for race in one of the four most saf 
  filter(suspect_race_description %in% c("BLACK", 
                                         "WHITE",
                                         "WHITE HISPANIC",
                                         "BLACK HISPANIC")) %>% 
  # Fill in NA values for nonsense descriptions
  # also change misspellings
  mutate(suspect_other_description = case_when(
    suspect_other_description == "(null)" ~ NA_character_,
    suspect_other_description == "NONE" ~ NA_character_,
    suspect_other_description == "UNK" ~ NA_character_,
    suspect_other_description == "UNKNOWN" ~ NA_character_,
    suspect_other_description == "UNKNOW" ~ NA_character_,
    suspect_other_description == "NO" ~ NA_character_,
    suspect_other_description == "UKNOWN" ~ NA_character_,
    suspect_other_description == "UNKOWN" ~ NA_character_,
    suspect_other_description == "NA" ~ NA_character_,
    suspect_other_description == "N/A" ~ NA_character_,
    suspect_other_description == "0" ~ NA_character_,
    suspect_other_description == 0 ~ NA_character_,
    suspect_other_description == "TATOOS" ~ "TATTOOS",
    suspect_other_description == "TATTOS" ~ "TATTOOS",
    suspect_other_description == "TATOO" ~ "TATTOOS",
    TRUE ~ as.character(suspect_other_description)
  )) %>% 
  # Change unknown sex values to NA
  mutate(suspect_sex = case_when(
    suspect_sex == "(null)" ~ NA_character_,
    suspect_sex == "19" ~ NA_character_,
    suspect_sex == "23" ~ NA_character_,
    suspect_sex == "24" ~ NA_character_,
    suspect_sex == "39" ~ NA_character_,
    TRUE ~ as.character(suspect_sex)
  )) %>% 
  # Change (null) value to No (N) (need to find a better way to do this step)
  mutate(physical_force_cew_flag = case_when(
    physical_force_cew_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_cew_flag)
  )) %>% 
  mutate(physical_force_draw_point_firearm_flag = case_when(
    physical_force_draw_point_firearm_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_draw_point_firearm_flag)
  )) %>% 
  mutate(physical_force_handcuff_suspect_flag = case_when(
    physical_force_handcuff_suspect_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_handcuff_suspect_flag)
  )) %>% 
  mutate(physical_force_oc_spray_used_flag = case_when(
    physical_force_oc_spray_used_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_oc_spray_used_flag)
  )) %>% 
  mutate(physical_force_other_flag = case_when(
    physical_force_other_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_other_flag)
  )) %>% 
  mutate(physical_force_restraint_used_flag = case_when(
    physical_force_restraint_used_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_restraint_used_flag)
  )) %>% 
  mutate(physical_force_verbal_instruction_flag = case_when(
    physical_force_verbal_instruction_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_verbal_instruction_flag)
  )) %>% 
  mutate(physical_force_weapon_impact_flag = case_when(
    physical_force_weapon_impact_flag == "(null)" ~ "N",
    TRUE ~ as.character(physical_force_weapon_impact_flag)
  )) %>% 
  mutate(suspects_actions_drug_transactions_flag = case_when(
    suspects_actions_drug_transactions_flag == "(null)" ~ "N",
    TRUE ~ as.character(suspects_actions_drug_transactions_flag)
  )) %>% 
  mutate(suspects_actions_proximity_to_scene_flag = case_when(
    suspects_actions_proximity_to_scene_flag == "(null)" ~ "N",
    TRUE ~ as.character(suspects_actions_proximity_to_scene_flag)
  )) %>% 
  mutate(other_contraband_flag = case_when(
    other_contraband_flag == "(null)" ~ "N",
    TRUE ~ as.character(other_contraband_flag)
  )) %>% 
  mutate(firearm_flag = case_when(
    firearm_flag == "(null)" ~ "N",
    TRUE ~ as.character(firearm_flag)
  )) %>% 
  mutate(knife_cutter_flag = case_when(
    knife_cutter_flag == "(null)" ~ "N",
    TRUE ~ as.character(knife_cutter_flag)
  )) %>% 
  mutate(weapon_found_flag = case_when(
    weapon_found_flag == "(null)" ~ "N",
    weapon_found_flag == "(" ~ "N",
    TRUE ~ as.character(weapon_found_flag)
  )) %>% 
  mutate(suspect_arrest_offense = case_when(
    suspect_arrest_offense == "(null)" ~ "NO ARREST",
    TRUE ~ as.character(suspect_arrest_offense)
  )) %>% 
  mutate(stop_location_boro_name = case_when(
    stop_location_boro_name == "(null)" ~ NA_character_,
    stop_location_boro_name == "STATEN IS" ~ "STATEN ISLAND",
    stop_location_boro_name == "PBBX" ~ "BRONX",
    stop_location_boro_name == "PBBN" ~ "BROOKLYN",
    stop_location_boro_name == "PBMN" ~ "MANHATTAN",
    stop_location_boro_name == "0208760" ~ NA_character_,
    stop_location_boro_name == "0190241" ~ NA_character_,
    stop_location_boro_name == "0986759" ~ NA_character_,
    stop_location_boro_name == "PBMS" ~ "MANHATTAN",
    stop_location_boro_name == "0210334" ~ NA_character_,
    stop_location_boro_name == "PBSI" ~ "STATEN ISLAND",
    stop_location_boro_name == "0237177" ~ NA_character_,
    stop_location_boro_name == "PBBS" ~ "BROOKLYN",
    stop_location_boro_name == "0155070" ~ NA_character_,
    stop_location_boro_name == "0208169" ~ NA_character_,
    TRUE ~ as.character(stop_location_boro_name)
  )) %>% 
  mutate(stop_location_precinct = case_when(
    stop_location_precinct == "(null)" ~ NA_integer_,
    stop_location_precinct == 208760 ~ NA_integer_,
    TRUE ~ as.integer(stop_location_precinct)
  )) %>% 
  mutate(demeanor_of_person_stopped = case_when(
    demeanor_of_person_stopped == "IRRATE" ~ "IRATE",
    demeanor_of_person_stopped == "1" ~ NA_character_,
    demeanor_of_person_stopped == "NEVEVOUS" ~ "NERVOUS",
    demeanor_of_person_stopped == 1 ~ NA_character_,
    demeanor_of_person_stopped == "N/A" ~ NA_character_,
    TRUE ~ as.character(demeanor_of_person_stopped)
  )) %>% 
  mutate(suspect_reported_age = case_when(
    suspect_reported_age == "(null)" ~ NA_character_,
    suspect_reported_age == "0" ~ NA_character_,
    suspect_reported_age == "1" ~ NA_character_,
    TRUE ~ as.character(suspect_reported_age)
  )) %>% 
  mutate(suspect_reported_age = as.integer(suspect_reported_age))

Due to the nature of the dataset, there had to be some data cleaning conducted in order to organize some of the columns:

Exploratory Data Analysis (EDA)

The cleaned and updated dataset includes 34,784 observations ommiting the stops conducting on people that were not described as Black, White, Black Hispanic, or White Hispanic.

Code
saf_sub %>% 
  count(suspect_race_description) %>% 
  ggplot(aes(x = n, y = fct_reorder(suspect_race_description, n))) +
  geom_col(fill = "#7dc8c4") +
  labs(x = "Number of Stops",
       y = "Race",
       title = "Number of Stops by Race") +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        axis.text = element_text(size = 11,
                                   face = "bold"),
        axis.title.x = element_text(size = 14,
                                   face = "bold"),
        axis.title.y = element_blank(),
        panel.grid.major.x = element_line(color = "grey27"),
        panel.grid.minor.x = element_line(color = "grey27"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"))

Code
saf_sub %>% 
  count(suspect_sex) %>% 
  drop_na() %>% 
  ggplot(aes(x = n, y = fct_reorder(suspect_sex, n))) +
  geom_col(fill = "#7dc8c4") +
  labs(x = "Number of Stops",
       y = "Sex",
       title = "Number of Stops by Sex") +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        axis.text = element_text(size = 11,
                                   face = "bold"),
        axis.title.x = element_text(size = 14,
                                   face = "bold"),
        axis.title.y = element_blank(),
        panel.grid.major.x = element_line(color = "grey27"),
        panel.grid.minor.x = element_line(color = "grey27"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"))

Code
d1 <- saf_sub %>% 
  count(day2) %>% 
  ggplot(aes(x = n, y = fct_reorder(day2, n))) +
  geom_col(fill = "#7dc8c4") +
  labs(x = "Number of Stops",
       y = "Weekday",
       title = "Number of Stops by  Weekday") +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        axis.text = element_text(size = 11,
                                   face = "bold"),
        axis.title.x = element_text(size = 14,
                                   face = "bold"),
        axis.title.y = element_blank(),
        panel.grid.major.x = element_line(color = "grey27"),
        panel.grid.minor.x = element_line(color = "grey27"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"))

m1 <- saf_sub %>% 
  count(month2) %>% 
  ggplot(aes(x = n, y = fct_reorder(month2, n))) +
  geom_col(fill = "#7dc8c4") +
  labs(x = "Number of Stops",
       y = "Month",
       title = "Number of Stops by Month") +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        axis.text = element_text(size = 11,
                                   face = "bold"),
        axis.title.x = element_text(size = 14,
                                   face = "bold"),
        axis.title.y = element_blank(),
        panel.grid.major.x = element_line(color = "grey27"),
        panel.grid.minor.x = element_line(color = "grey27"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"))

y1 <- saf_sub %>% 
  count(year2) %>% 
  mutate(year = fct_reorder(factor(year2), n)) %>% 
  ggplot(aes(x = n, y = year)) +
  geom_col(fill = "#7dc8c4") +
  labs(x = "Number of Stops",
       y = "Month",
       title = "Number of Stops by Month") +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        axis.text = element_text(size = 11,
                                   face = "bold"),
        axis.title.x = element_text(size = 14,
                                   face = "bold"),
        axis.title.y = element_blank(),
        panel.grid.major.x = element_line(color = "grey27"),
        panel.grid.minor.x = element_line(color = "grey27"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"))
(d1 + m1) / y1

Code
saf_sub %>% 
  drop_na(suspect_reported_age) %>% 
  filter(suspect_reported_age >= 10 & 
           suspect_reported_age <= 80) %>% 
  ggplot(aes(x = suspect_reported_age)) +
  geom_histogram(fill = "#7dc8c4",color = "white", 
                 binwidth = 5) +
  scale_x_continuous(breaks = seq(10, 80, 10)) +
  labs(x = "Age",
       y = "Number of Stops",
       title = "Number of Stops by Month") +
  theme_minimal() +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        axis.text = element_text(size = 11,
                                   face = "bold"),
        axis.title.x = element_text(size = 14,
                                   face = "bold"),
        axis.title.y = element_blank(),
        panel.grid.major.y = element_line(color = "grey27"),
        panel.grid.minor.y = element_line(color = "grey27"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"))

Code
saf_sub %>% 
  count(stop_location_boro_name) %>%
  drop_na() %>% 
  ggplot(aes(x = n, y = fct_reorder(stop_location_boro_name, n))) +
  geom_col(fill = "#7dc8c4") +
  labs(x = "Number of Stops",
       y = "Borough",
       title = "Number of Stops by Borough") +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        axis.text = element_text(size = 11,
                                   face = "bold"),
        axis.title.x = element_text(size = 14,
                                   face = "bold"),
        axis.title.y = element_blank(),
        panel.grid.major.x = element_line(color = "grey27"),
        panel.grid.minor.x = element_line(color = "grey27"),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"))

Code
plotly1 <- saf_sub %>% 
  drop_na(suspect_reported_age, suspect_race_description) %>% 
  filter(suspect_reported_age >= 10 & 
           suspect_reported_age <= 80) %>% 
  count(suspect_race_description, suspect_reported_age) %>% 
  ggplot(aes(x = suspect_race_description, y = suspect_reported_age)) +
  geom_tile(aes(fill = n)) +
  scale_fill_viridis_c(option = "cividis") +
  scale_y_continuous(breaks = seq(10, 80, 5)) +
  labs(x = "Race") +
  theme_minimal() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text = element_text(size = 11,
                                   face = "bold"),
        axis.title.x = element_text(size = 14,
                                   face = "bold"),
        axis.title.y = element_blank(),
        plot.background = element_rect(colour = "#e7eaea",
                                       fill = "#e7eaea"))

ggplotly(plotly1)
Code
saf_sub %>% 
  count(suspect_race_description, suspected_crime_description) %>% 
  pivot_wider(names_from = suspect_race_description,
              values_from = n) %>% 
  gt() %>% 
  tab_header(
    title = "Race and Alleged Crime Description",
    subtitle = "Data from 2017-2019"
  )  %>% 
  data_color(
    columns = vars(BLACK,`BLACK HISPANIC`,
                   WHITE,`WHITE HISPANIC`),
    colors = scales::col_numeric(
      palette = c(
        "lightskyblue", "dodgerblue", "royalblue4") ,
      domain = NULL
        )
      ) %>% 
  tab_style(
    style = list(
      cell_text(style = "oblique"),
      cell_borders(
        side = c("right"), 
        color = "black",
        weight = px(2)
        )
    ),
    locations = cells_body(
      columns = 1
    )
    )
Race and Alleged Crime Description
Data from 2017-2019
suspected_crime_description BLACK BLACK HISPANIC WHITE WHITE HISPANIC
ASSAULT 2592 417 342 1021
AUTO STRIPPIG 67 19 17 47
BURGLARY 1213 215 458 631
CPSP 96 19 26 54
CPW 6205 872 551 1948
CRIMINAL MISCHIEF 342 45 82 150
CRIMINAL POSSESSION OF CONTROLLED SUBSTANCE 124 10 69 62
CRIMINAL POSSESSION OF FORGED INSTRUMENT 34 4 3 6
CRIMINAL POSSESSION OF MARIHUANA 365 61 24 166
CRIMINAL SALE OF CONTROLLED SUBSTANCE 115 25 47 80
CRIMINAL SALE OF MARIHUANA 57 9 6 23
CRIMINAL TRESPASS 1020 160 202 482
FORCIBLE TOUCHING 59 7 16 35
GRAND LARCENY 895 152 115 273
GRAND LARCENY AUTO 577 94 136 271
MAKING GRAFFITI 36 15 54 68
MENACING 401 67 56 163
MISD 1 2 NA 1
MISDEMEANOR 1 2 NA 1
MURDER 46 7 4 21
OTHER 871 130 152 321
PETIT LARCENY 1870 216 449 607
PROSTITUTION 13 1 3 8
RAPE 43 8 8 15
RECKLESS ENDANGERMENT 142 18 7 44
ROBBERY 3415 500 290 1104
TERRORISM 7 1 17 3
THEFT OF SERVICES 96 8 25 30
UNAUTHORIZED USE OF A VEHICLE 114 18 106 88
FELONY NA NA 1 NA